[newchem-cpp] Improve implementation of FrozenStringIdxBiMap#484
Open
mabruzzo wants to merge 32 commits intograckle-project:newchem-cppfrom
Open
[newchem-cpp] Improve implementation of FrozenStringIdxBiMap#484mabruzzo wants to merge 32 commits intograckle-project:newchem-cppfrom
FrozenStringIdxBiMap#484mabruzzo wants to merge 32 commits intograckle-project:newchem-cppfrom
Conversation
There is some unforunate consequences, but if we don't do this you end up with some pretty confusing looking code... (the confusion primarily arises if new_FrozenKeyIdxBiMap or drop_FrozenKeyIdxBiMap has different behavior from other functions with similar names). I really think it would be better to make this into a simple C++ class with a constructor and destructor, but thats a topic for another time
One minor change involved formally making it illegal to construct a GrainSpeciesInfo instance that holds 0 dust species. That's perfectly fine. `GrainSpeciesInfo` holds some redundant info that we will remove in the a subsequent commit
Modify tests to start using `FrozenKeyIdxBiMap` rather than `GrainSpeciesInfoEntry::onlygrainsp_idx` and `GrainSpeciesInfoEntry::name`. In the next commit, we'll remove these members from `GrainSpeciesInfoEntry`
Delete `GrainSpeciesInfoEntry::onlygrainsp_idx` and `GrainSpeciesInfoEntry::name`.
Also made a few assorted fixes
Both changes essentially affected 2 constants, which prior to this commit were known as `keylen_max` & `invalid_val`. The 2 contained constants are really just implementation details that users of the inferface for `FrozenKeyIdxBiMap` shouldn't need to know anything about
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This should be reviewed after #462 has been merged
Background Context
Recall that PR #451 introduced the
FrozenStringIdxBiMap. It is a datatype that implements a bidirectional map (aka a bidirectional dictionary), that can be used to map between a unique set of n strings (keys) and a unique set of indexes (with values of 0 through n-1) and vice-versa. In PR #451 I intentionally adopted an overly-simplistic implementation for the datatype (i.e. it just did linear searches over the full array) to make PR easier to review.Description
This PR totally changes the underlying implementation to improve efficiency. These changes have NO IMPACT on the interface. Under the hood,
FrozenStringIdxBiMapis now implemented in terms of a Hash Table.On a historical note, this PR effectively supersedes #270 (which originally proposed the same implementation)
Tradeoffs:
Benefit
This is clear-cut: average time complexity to search for a key goes from
Θ(n)(linear) toΘ(1)(constant)Cost
The storage allocated from the heap is larger. Suppose
sis the average number of bytes needed to track a string allocation andnkeys in a bimap:n*(8 + s)bytes (we just allocatenpointers)n*(2+16*k + s)bytesk * n * 16bytes for each row of the hash table and2*nbytes (akanstd::uint16_t values) for rapid reverse lookupkis the ratio between capacity & number of keys (a largerkreduces collisions). Currently, it's 2 (but we can reduce it!)Let's throw out some concrete numbers. Suppose we have 100 items (frankly, that seems a pretty high) and the average length is ~15 bytes
s=0)s=15)The fact that we are talking about less than 5 killobytes makes me think this cost isn't terribly significant (and, if it's a concern, we could try reducing
kfrom 2 to 1.5)In terms of absolute numbers, 5kB clearly isn't a concern (a linux page size is 4 kB). And if we ever refactor enough and use this in all the places I think it could be useful (e.g. using it to provide map-access to chemistry parameters or grackle fields, and for one or two cases to help with setup), we are talking about maybe creating 5 instances while initializing grackle (and most would promptly get de-allocated)
Other thoughts
Since this just slots into the interface defined by #451, there isn't a ton of urgency to this PR